arXiv:1506.04505v3 [cs.DS] 29 Jul 2015


Applications of Uniform Sampling: 
Densest Subgraph and Beyond* 


Hossein Esfandiari^ 
University of Maryland 


MohammadTaghi Hajiaghayi^
University of Maryland 


David P. Woodruff 
IBM Almaden 


July 30, 2015 


Abstract 

Recently, Bhattacharya et al. [STOC 2015] [3] provided the first non-trivial algorithm for the
densest subgraph problem in the streaming model with additions and deletions to its edges, i.e.,
for dynamic graph streams. They present a $(0.5 - \epsilon)$-approximation algorithm using $\tilde{O}(n)$ space,
where factors of $\epsilon$ and $\log(n)$ are suppressed in the $\tilde{O}$ notation. However, the update time of
this algorithm is large. To remedy this, they also provide a $(0.25 - \epsilon)$-approximation algorithm
using $\tilde{O}(n)$ space with update time $\tilde{O}(1)$.

In this paper we improve the algorithms in [3] by providing a $(1 - \epsilon)$-approximation algorithm
using $\tilde{O}(n)$ space. Our algorithm is conceptually simple: it samples $\tilde{O}(n)$ edges uniformly at
random, and finds the densest subgraph on the sampled graph. We also show how to perform
this sampling with update time $\tilde{O}(1)$. In addition to this, we show that given oracle access to
the edge set, we can implement our algorithm in time $\tilde{O}(n)$ on a graph in the standard RAM
model. To the best of our knowledge this is the fastest $(0.5 - \epsilon)$-approximation algorithm for
the densest subgraph problem in the RAM model given such oracle access.

Interestingly, we extend our results to a general class of graph optimization problems that 
we call heavy subgraph problems. This class contains many interesting problems such as densest 
subgraph, directed densest subgraph, densest bipartite subgraph, d-max cut, and d-sum-max 
clustering. Our results, by characterizing heavy subgraph problems, address Open Problem 
13 at the IITK Workshop on Algorithms for Data Streams in 2006 regarding the effects of 
subsampling, in the context of graph streams. 


*This paper has been available on arXiv since June 15, 2015.

^Supported in part by NSF CAREER award 1053605, NSF grant CCF-1161626, ONR YIP award N000141110662, 
DARPA/AFOSR grant FA9550-12-1-0423. 



1 Introduction 


In this paper we consider a general class of graph optimization problems that we call heavy subgraph 
problems in the streaming setting with additions and deletions, i.e., in dynamic graph streams. We 
show that many interesting problems such as densest subgraph, directed densest subgraph, densest 
bipartite subgraph, d-max cut, and d-sum-max clustering fit in this general class of problems. To
the best of our knowledge, we are the first to consider densest bipartite subgraph and d-sum-max
clustering in the streaming setting. We defer the definitions of these two problems to Subsections
4.1 and 4.4, respectively.

Finding the densest subgraph of a graph is one of the classical problems in computer science 
and appears to have many applications that deal with massive data such as spam detection [9] and
analyzing communication in social networks [6]. 

In an instance of the densest subgraph problem we are given a graph G and want to find a
subgraph $sol \subseteq G$ that maximizes $\frac{|E_{sol}|}{|V_{sol}|}$, where $V_{sol}$ and $E_{sol}$ are the vertex set and the edge set
of sol, respectively. Similarly, in an instance of the directed densest subgraph problem we are given
a directed graph $G = (V_G, E_G)$ and want to find a pair $A, B \subseteq V_G$ that maximizes $\frac{E(A,B)}{\sqrt{|A| \cdot |B|}}$, where
$E(A,B)$ is the number of edges from A to B in $E_G$.

Charikar [4] studies the densest subgraph problem in the classical setting and provides a
0.5-approximation algorithm with running time $O(n+m)$, where n and m are the number of vertices and
edges, respectively. To the best of our knowledge, this is the fastest known constant approximation
algorithm for the densest subgraph problem. Moreover, he provides a 0.5-approximation algorithm
for the directed densest subgraph problem.
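To make the 0.5-approximation concrete, here is a minimal sketch of greedy min-degree peeling, the idea behind Charikar's algorithm for the undirected case. The function name `greedy_peel` and the heap-based bookkeeping are illustrative choices, not taken from [4]; a heap gives $O((n+m)\log n)$ time rather than the linear time quoted above, which needs a bucket structure.

```python
import heapq
from collections import defaultdict

def greedy_peel(edges):
    """Repeatedly delete a minimum-degree vertex; return the densest intermediate
    vertex set seen and its density |E|/|V| (illustrative sketch)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    deg = {v: len(nb) for v, nb in adj.items()}
    m = sum(deg.values()) // 2          # number of edges among alive vertices
    alive = set(deg)
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    order = []                          # vertices in removal order
    best_density = m / len(alive) if alive else 0.0
    best_size = len(alive)
    while alive:
        d, v = heapq.heappop(heap)
        if v not in alive or d != deg[v]:
            continue                    # stale heap entry
        alive.remove(v)
        order.append(v)
        m -= deg[v]
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
        if alive and m / len(alive) > best_density:
            best_density, best_size = m / len(alive), len(alive)
    # The best set is whatever was still alive when best_size vertices remained.
    return set(order[len(order) - best_size:]), best_density
```

For example, on a 4-clique with a two-edge path attached, peeling removes the path first and reports the clique with density 6/4.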

Later, Bahmani, Kumar and Vassilvitskii [2] consider the densest subgraph problem and the
directed densest subgraph problem in the streaming setting with only insertions of edges. For
both problems, they present streaming algorithms with a $\frac{1}{2+2\epsilon}$-approximation factor using $\log_{1+\epsilon}(n)$
passes over the input. To the best of our knowledge, their results for directed graphs are the only
non-trivial results for the directed densest subgraph problem in the streaming setting, prior to our
work.

Recently, Bhattacharya et al. [3] present the first single pass streaming algorithm for dynamic
graph streams. Their first algorithm provides a $(0.5 - \epsilon)$-approximation using $\tilde{O}(n)$ bits of
space, though the update time is inefficient. They provide a second algorithm with a
$(0.25 - \epsilon)$-approximation factor for which the update time is only $\tilde{O}(1)$, again using $\tilde{O}(n)$ bits of space.

In the d-max cut problem, we are given a graph G, and we want to decompose the vertices of G 
into d partitions such that the number of edges between different partitions is maximized. To the 
best of our knowledge, we are the first to consider this problem for general d in the streaming setting. 
A restricted version of this problem, where d = 2, is the classic max cut problem. One can store a
sparsifier [1] of the input graph in $\tilde{O}(n)$ space, and preserve a $(1 - \epsilon)$-approximation of the max cut.
However, it is not clear whether sparsifiers preserve d-max cut. Recently, Kapralov, Khanna and
Sudan showed that any $(1 - \epsilon)$-approximation to the max cut problem in the streaming setting
requires $n^{1-O(\epsilon)}$ space.

In this paper, we refer to $\frac{|E_{sol}|}{|V_{sol}|}$ as the density of the solution sol, and denote it by den(sol). We
refer to the density of the densest subgraph of G by opt(G). We also denote the densest subgraph
among all those on k vertices by opt(G,k). We sometimes abuse notation and use opt(G) to refer
to both the densest subgraph as well as its density. It is easy to see that the densest subgraph of
a graph G is an induced subgraph of it, otherwise we can simply add more edges to it. Therefore,
we can indicate the densest subgraph by its vertex set. For a graph G, and a subset of vertices U,
we denote the induced subgraph of G on vertex set U by G[U].

1.1 Our Results 

In this paper, we first consider the densest subgraph problem in the streaming setting where we
have both insertions and deletions to the edges as they arrive in the stream, i.e., in a dynamic
graph stream. We improve the result of Bhattacharya et al. (STOC'15) [3] by providing a
$(1 - \epsilon)$-approximation algorithm for this problem using $\tilde{O}(n)$ space. Indeed, our algorithm simply samples
$\tilde{O}(n)$ edges uniformly at random, and finds the densest subgraph on the sampled graph. We also
achieve update time $\tilde{O}(1)$. To achieve this, we use min-wise independent hashing together with fast
multi-point polynomial evaluation.

Theorem 1.1. There exists a semi-streaming algorithm in dynamic graph streams for the densest
subgraph problem with space $\tilde{O}(n)$ which gives a $(1 - \epsilon)$-approximate solution, with probability $1 - 1/n$.
The update time is $\tilde{O}(1)$.

In addition, our algorithm can be implemented using an oracle that provides direct access to
a uniformly sampled edge. This algorithm makes $\tilde{O}(n)$ queries to the oracle and finds the densest
subgraph on a graph with n vertices and $\tilde{O}(n)$ edges. Therefore, by replacing the m in the running
time of the algorithm of Charikar [4] with $\tilde{O}(n)$, we achieve $\tilde{O}(n)$ running time, which is tight up
to logarithmic factors. To the best of our knowledge, this is the fastest constant approximation
algorithm for the densest subgraph problem in the RAM model, given such an oracle.

Theorem 1.2. Suppose we have oracle access to the edge set of the input graph with the ability to
sample an edge uniformly at random. There exists an algorithm for the densest subgraph problem
running in time $\tilde{O}(n)$, which gives a $(0.5 - \epsilon)$-approximate solution, with probability $1 - 1/n$.

Next, we extend our results to a general family of graph optimization problems that we call 
heavy subgraph problems. Interestingly, we show that by uniformly sampling edges we obtain 
enough information about the solution of any heavy subgraph problem. Since the solution of a 
heavy subgraph problem itself may be as large as the whole graph, here we just claim that we can 
estimate the size of the optimum solution. However, in some cases, like for the densest subgraph 
problem, it might be possible to also obtain the optimum solution itself, and not just the size, from 
the sampled graph. 

A graph optimization problem is defined by an input graph G, a set of feasible solutions $Sol_G$,
which are subgraphs of G, and an objective function $f : Sol_G \to \mathbb{R}$. In a graph optimization problem
we aim to find a solution $sol \in Sol_G$ that maximizes f. In fact, the number of feasible solutions for a
graph G may be exponential in the size of G. We say a graph optimization problem on graph G is a
$(\gamma, l)$-heavy subgraph problem if there exist l sets $Sol_G^1, Sol_G^2, \ldots, Sol_G^l$ such that $Sol_G = \bigcup_{k=1}^{l} Sol_G^k$
and for any k:

• Local Linearity: There are l numbers $f_1 \ge f_2 \ge \cdots \ge f_l = 1$, such that for any $1 \le k \le l$
and any solution $sol \in Sol_G^k$, we have $f(sol) = f_k \cdot |E_{sol}|$, where $E_{sol}$ is the edge set of sol.

• Hereditary Property: For any spanning subgraph $H \subseteq G$, we have $sol_H \in Sol_H^k$ if and
only if there exists a solution $sol_G \in Sol_G^k$ such that $sol_H = sol_G \cap H$.

• $\gamma$ Bound: $\gamma$ is chosen such that the optimum solution is lower bounded by $\gamma \log(|Sol_G^k|) f_k \frac{m}{n}$.




Let $\mathcal{P}(\gamma, l)$ be a heavy subgraph problem, and let Alg be an $\alpha$-approximation algorithm for
$\mathcal{P}$. Algorithm 2 samples $O\left(\frac{n\delta\log(l)}{\gamma\epsilon^2}\right)$ edges of the input graph and runs Alg on the sampled graph.

Interestingly, the following theorem shows that Algorithm 2 is an $(\alpha - \epsilon)$-approximation algorithm for
$\mathcal{P}$ on G.

Theorem 1.3. Let $\mathcal{P}(\gamma, l)$ be a heavy subgraph problem. Let G be an arbitrary graph, and let
Alg be an $\alpha$-approximation algorithm for $\mathcal{P}$. With probability $1 - e^{-\delta}$, Algorithm 2 is an
$(\alpha - \epsilon)$-approximation algorithm for $\mathcal{P}$ on G, using $O\left(\frac{n\delta\log(l)}{\gamma\epsilon^2}\right)$ space.

Finally, in Section 4 we show several applications of Theorem 1.3. Indeed, we show that directed
densest subgraph, densest bipartite subgraph, d-max cut and d-sum-max clustering all fit in the
general family of heavy subgraph problems, and thus, Theorem 1.3 holds for them.

Theorem 1.4. The following statements hold. 

• Densest bipartite subgraph is a $(\gamma = \frac{1}{2(\log(n)+1)}, l = n)$ heavy subgraph problem.

• Directed densest subgraph is a $(\gamma = \frac{1}{2\sqrt{n}\log(n)}, l = n^2)$ heavy subgraph problem.

• d-max cut is a $(\gamma = \frac{1}{2\log(d)}, l = 1)$ heavy subgraph problem.

• d-sum-max clustering is a (7 = , l = 1 ) heavy subgraph problem. 

In fact, understanding the structure of the problems that can be solved using sampling, and 
specifically uniform sampling, is a well-motivated challenge, which was highlighted as a direction 
in the IITK Workshop on Algorithms for Data Streams in 2006. Our structural results, as well as 
our characterization for heavy subgraph problems, give partial answers to this open question in the 
context of graphs. 

In simultaneous and independent work, McGregor et al. present a single pass
$(1 - \epsilon)$-approximation algorithm for the densest subgraph problem in the dynamic graph streaming model
with update time $\tilde{O}(1)$, that uses $\tilde{O}(n)$ space.

2 Densest Subgraph 

In this section we analyze a simple algorithm for the densest subgraph problem. This algorithm
simply samples $\tilde{O}(n)$ edges uniformly at random (without replacement) and solves the problem on
the sampled subgraph, where factors of $1/\epsilon$, $\delta$ and $\log(n)$ are hidden in the $\tilde{O}$ notation. Interestingly,
we show that, with probability $1 - m^{-\delta}$, this algorithm gives a $(1 - \epsilon)$-approximate solution using
$\tilde{O}(n)$ bits of space.

In addition, this algorithm can be implemented with a running time of $\tilde{O}(n)$, using oracle access
to the edge set, with the ability to sample an edge uniformly at random. This works by combining
our sampling guarantee with the algorithm of Charikar [4].

To the best of our knowledge, our $(0.5 - \epsilon)$-approximation algorithm is the fastest constant
approximation algorithm for the densest subgraph problem. This algorithm can be implemented 
in the streaming setting with insertions and deletions to the edges (the strict turnstile or dynamic 
graph stream model), as well as the RAM model with oracle access to random edge samples (we 
discuss the latter model below in the context of improving the running time of the best offline 
algorithms). 










Before stating our main lemma, we need the following generalized version of the Chernoff
inequality, which holds for negatively correlated random variables. We say Boolean random variables
$x_1, x_2, \ldots, x_r$ are negatively correlated if for any arbitrary subset S of $\{x_1, x_2, \ldots, x_r\}$, and any
arbitrary $a \in S$ we have $\Pr(a = 1 \mid \forall_{b \in S \setminus \{a\}}\, b = 1) \le \Pr(a = 1)$ [7].

Lemma 2.1 ([15]). Let $x_1, x_2, \ldots, x_r$ be a sequence of negatively correlated Boolean (i.e., 0 or 1)
random variables, and let $X = \sum_{i=1}^{r} x_i$. We have

$$\Pr(|X - E[X]| > \epsilon E[X]) \le 3\exp(-\epsilon^2 E[X]/3).$$

To use the above generalized version of the Chernoff bound, we need to show that our random
variables are negatively correlated. The following lemma is useful for showing this.

Lemma 2.2. Let $x_1, x_2, \ldots, x_r$ be a sequence of Boolean random variables, such that exactly t
of them are chosen to be 1 uniformly at random. Then the random variables $x_1, x_2, \ldots, x_r$ are
negatively correlated.

Proof. Let S be an arbitrary subset of $\{x_1, x_2, \ldots, x_r\}$ and let a be an arbitrary element of S. On
the one hand, the probability that a = 1 is $\frac{t}{r}$. On the other hand, conditioned on b = 1 for every element
$b \in S \setminus \{a\}$, the probability that a = 1 is $\frac{t - |S| + 1}{r - |S| + 1}$. Clearly, $\frac{t}{r} \ge \frac{t - |S| + 1}{r - |S| + 1}$, which
gives us $\Pr(a = 1 \mid \forall_{b \in S \setminus \{a\}}\, b = 1) \le \Pr(a = 1)$. This means the random variables $x_1, x_2, \ldots, x_r$ are
negatively correlated. □
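The conditional probability in this proof can be checked by brute force on small instances. The sketch below enumerates every way of setting exactly t of r variables to 1 and computes the probability that $x_0 = 1$ given that $x_1, \ldots, x_{s-1}$ are all 1 (so $|S| = s$). The helper name `conditional_prob` is illustrative, and it assumes $s - 1 \le t$ so that the conditioning event is possible.

```python
from fractions import Fraction
from itertools import combinations

def conditional_prob(r, t, s):
    """Pr(x_0 = 1 | x_1 = ... = x_{s-1} = 1) when exactly t of r variables
    are set to 1 uniformly at random, computed by exhaustive enumeration."""
    hits = total = 0
    for ones in combinations(range(r), t):
        chosen = set(ones)
        if all(i in chosen for i in range(1, s)):   # conditioning event holds
            total += 1
            hits += 0 in chosen
    return Fraction(hits, total)

# Compare the enumeration with (t - |S| + 1) / (r - |S| + 1) and with t / r.
r, t = 6, 3
for s in range(1, 5):
    exact = conditional_prob(r, t, s)
    print(s, exact, Fraction(t - s + 1, r - s + 1), exact <= Fraction(t, r))
```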


Algorithm 1 Finding Densest Subgraph
Input: A graph $G = (V, E_G)$.
Output: A $(1 - \epsilon)$-approximation of the densest subgraph of G, w.pr. $1 - m^{-\delta}$.
1: Set $C = \frac{12n(4+\delta)\log(m)}{\epsilon^2}$.
2: if $|E_G| \le C$ then
3:     Find a densest subgraph of G using the algorithm of [4] (or any other algorithm).
4: else
5:     Sample C edges uniformly at random, without replacement, from G.
6:     Let H be the sampled graph.
7:     Find a densest subgraph of H.
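The sampling step of Algorithm 1 can be sketched as follows. The function names are illustrative, the logarithm is assumed to be natural (the text does not fix the base), and the densest-subgraph solver for step 7 is left out. The helper `density` is included only to show how a density measured on the sample H is rescaled by $1/p$, with $p = C/m$, to estimate the density in G.

```python
import math
import random

def sample_budget(n, m, eps, delta):
    """C = 12 n (4 + delta) log(m) / eps^2, rounded up (natural log assumed)."""
    return math.ceil(12 * n * (4 + delta) * math.log(m) / eps ** 2)

def sample_edges(edges, budget, rng):
    """Steps 2-6: keep everything if |E| <= budget, otherwise draw `budget`
    edges uniformly without replacement. Returns (edge list, p)."""
    edges = list(edges)
    if len(edges) <= budget:
        return edges, 1.0
    return rng.sample(edges, budget), budget / len(edges)

def density(edges, vertices):
    """Number of edges induced by `vertices`, divided by |vertices|."""
    vs = set(vertices)
    return sum(1 for u, v in edges if u in vs and v in vs) / len(vs)

# Usage: estimate den(G[U]) from the sample as den(H[U]) / p.
rng = random.Random(0)
G = [(u, v) for u in range(40) for v in range(u + 1, 40)]
H, p = sample_edges(G, 300, rng)
U = range(20)
print(density(G, U), density(H, U) / p)
```

The budget of 300 in the usage lines is an arbitrary small number chosen so that sampling actually happens; the constant prescribed by `sample_budget` exceeds the edge count of such a small graph.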


Lemma 2.3. Let $G = (V, E_G)$ be a graph with vertex set V and edge set $E_G$. Let $H = (V, E_H)$ be
the sampled graph in Algorithm 1 and let $p = \frac{C}{m}$. We have for any k,

$$\Pr\big(opt(G,k) - den_G(opt(H,k)) > \epsilon\, opt(G)\big) \le 6\exp\left(k \cdot \log(m) - \frac{pk\epsilon^2 opt(G)}{12}\right),$$

where $den_G(opt(H,k))$ is the density of the subgraph of G induced by the vertices of opt(H,k).

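Proof. Let $x_e$ be the random variable that indicates whether e exists in $E_H$ or not, and let U be
an arbitrary subset of V of size k. By definition, the number of edges in H[U] is
$\sum_{e \in H[U]} 1 = \sum_{e \in G[U]} x_e$. Let us denote this summation by $X_U$. Then, we have

$$E[den(H[U])] = E\left[\frac{\sum_{e \in G[U]} x_e}{|U|}\right] = \frac{\sum_{e \in G[U]} E[x_e]}{|U|} = \frac{\sum_{e \in G[U]} p}{|U|} = p\,\frac{\sum_{e \in G[U]} 1}{|U|} = p \cdot den(G[U]), \tag{1}$$

where the first and the last equalities are by definition of density, and the third equality is by
definition of p. Lemma 2.2 says that the random variables $x_1, x_2, \ldots, x_{|E_G|}$ are negatively correlated.
Thus, to bound $X_U$ (written X below), we can apply the following form of the Chernoff bound from Lemma 2.1:

$$\Pr(|X - E[X]| > \epsilon' E[X]) \le 3\exp(-\epsilon'^2 E[X]/3).$$

By setting $\epsilon' = \frac{pk\epsilon\, opt(G)}{2E[X]}$ we have

$$\begin{aligned}
\Pr\left(|X - E[X]| > \frac{pk\epsilon\, opt(G)}{2}\right) &\le 3\exp\left(-\frac{p^2k^2\epsilon^2 opt(G)^2}{4E[X]^2}\cdot\frac{E[X]}{3}\right) \\
&= 3\exp\left(-\frac{p^2k^2\epsilon^2 opt(G)^2}{12E[X]}\right) \\
&\le 3\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right). && \text{Using Equality (1)}
\end{aligned}$$

On the other hand we have

$$\begin{aligned}
\Pr\left(|X - E[X]| > \frac{pk\epsilon\, opt(G)}{2}\right) &= \Pr\left(\frac{1}{p}\cdot\frac{|X - E[X]|}{k} > \frac{\epsilon}{2}opt(G)\right) \\
&= \Pr\left(\left|\frac{1}{p}\cdot\frac{X}{k} - \frac{1}{p}\cdot\frac{E[X]}{k}\right| > \frac{\epsilon}{2}opt(G)\right) \\
&= \Pr\left(\left|\frac{1}{p}den(H[U]) - den(G[U])\right| > \frac{\epsilon}{2}opt(G)\right). && \text{Using Equality (1)}
\end{aligned}$$

Therefore, we have

$$\Pr\left(\left|\frac{1}{p}den(H[U]) - den(G[U])\right| > \frac{\epsilon}{2}opt(G)\right) \le 3\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right). \tag{2}$$

If we set U to be the vertex set of opt(G,k), Inequality (2) says that

$$\Pr\left(opt(G,k) - \frac{1}{p}den_H(opt(G,k)) > \frac{\epsilon}{2}opt(G)\right) \le 3\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right),$$

which immediately gives us

$$\Pr\left(opt(G,k) - \frac{1}{p}opt(H,k) > \frac{\epsilon}{2}opt(G)\right) \le 3\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right). \tag{3}$$

On the other hand, Inequality (2) says that for each selection of U, with probability
$1 - 3\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right)$ we can upper bound $\frac{1}{p}den(H[U])$ by $den(G[U]) + \frac{\epsilon}{2}opt(G)$. As we have $\binom{n}{k}$
such choices, applying a union bound we have

$$\Pr\left(\exists\, U : |U| = k,\ \frac{1}{p}den(H[U]) - den(G[U]) > \frac{\epsilon}{2}opt(G)\right) \le 3\binom{n}{k}\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right).$$

If we set U to opt(H,k) we have

$$\Pr\left(\frac{1}{p}opt(H,k) - den_G(opt(H,k)) > \frac{\epsilon}{2}opt(G)\right) \le 3\binom{n}{k}\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right). \tag{4}$$

Therefore, by combining Inequalities (3) and (4) and applying the union bound we have

$$\begin{aligned}
\Pr\big(opt(G,k) - den_G(opt(H,k)) > \epsilon\, opt(G)\big) &\le 3\left(\binom{n}{k} + 1\right)\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right) \\
&\le 3(m^k + 1)\exp\left(-\frac{pk\epsilon^2 opt(G)}{12}\right) \\
&\le 6\exp\left(k \cdot \log(m) - \frac{pk\epsilon^2 opt(G)}{12}\right).
\end{aligned}$$

□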


In fact, the density of the densest subgraph of a graph G is at least as much as the density of
G itself. Hence, one can lower bound opt(G) by the density of G, which is $\frac{m}{n}$. The following states
this fact.


Fact 2.4. The density of the densest subgraph of G is at least $\frac{m}{n}$, where n is the number of vertices
in G and m is the number of edges.

The following theorem bounds the approximation ratio of Algorithm 1.

Theorem 2.5. With probability $1 - m^{-\delta}$, Algorithm 1 is a $(1 - \epsilon)$-approximation algorithm for the
densest subgraph problem.

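Proof. Recall that Lemma 2.3 states that for an arbitrary fixed k, with probability at least
$1 - 6\exp\left(k \cdot \log(m) - \frac{pk\epsilon^2 opt(G)}{12}\right)$ we have $opt(G,k) - den_G(opt(H,k)) \le \epsilon\, opt(G)$. Using a union bound this holds
for all $1 \le k \le n$, with probability $1 - \sum_{k=1}^{n} 6\exp\left(k \cdot \log(m) - \frac{pk\epsilon^2 opt(G)}{12}\right)$. Thus, for some value of
k, with probability $1 - n \cdot 6\exp\left(k \cdot \log(m) - \frac{pk\epsilon^2 opt(G)}{12}\right)$ we have $opt(G) - den_G(opt(H)) \le \epsilon\, opt(G)$,
which means that Algorithm 1 returns a $(1 - \epsilon)$-approximation. Now, if we set p to $\frac{12n(4+\delta)\log(m)}{\epsilon^2 m}$,
or equivalently set C in Algorithm 1 to $\frac{12n(4+\delta)\log(m)}{\epsilon^2}$, we have

$$\begin{aligned}
&1 - n \times 6\exp\left(k \cdot \log(m) - \frac{pk\epsilon^2 opt(G)}{12}\right) \\
&= 1 - n \times 6\exp\left(k \cdot \log(m) - \frac{12n(4+\delta)\log(m)}{\epsilon^2 m} \times \frac{k\epsilon^2 opt(G)}{12}\right) \\
&\ge 1 - n \times 6\exp\left(k \cdot \log(m) - \frac{12n(4+\delta)\log(m)}{\epsilon^2 m} \times \frac{k\epsilon^2 m}{12n}\right) && \text{From Fact 2.4} \\
&= 1 - n \times 6\exp\big(k \cdot \log(m) - (4+\delta)k \cdot \log(m)\big) \\
&= 1 - n \times 6\exp\big(-(3+\delta)k \cdot \log(m)\big) \\
&\ge 1 - \exp\big(\log(n) + 2 - (3+\delta)k \cdot \log(m)\big) && \text{Since } e^2 > 6 \\
&\ge 1 - \exp(-\delta \cdot k \cdot \log(m)) \\
&\ge 1 - \exp(-\delta \cdot \log(m)) \\
&= 1 - m^{-\delta}.
\end{aligned}$$

□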

One can use $\ell_0$-samplers to sample the C edges in Algorithm 1 in a dynamic graph stream.
However, maintaining these $\ell_0$-samplers may need an update time as large as C. In Lemma 2.9
we show how to sample C edges with $\tilde{O}(1)$ update time using the notion of min-wise independent
hash functions.


Definition 2.6. Given $\epsilon > 0$, we say a hash function $h : [1,n] \to [1,n]$ is $\epsilon$-approximately
t-min-wise independent on a subset X of $\{1, 2, \ldots, n\}$ if for any $Y \subseteq X$ such that $|Y| = t$ we
have

$$\Pr\left(\max_{y \in Y} h(y) < \min_{z \in X \setminus Y} h(z)\right) = \frac{1}{\binom{|X|}{|Y|}}(1 \pm \epsilon).$$

Theorem 2.7 (Theorem 1.1 of [8]). There exist constants $c', c'' > 1$ such that for any
$X \subseteq \{1, 2, \ldots, n\}$ of size at most $\epsilon n / c'$, any $c''(t\log\log(1/\epsilon) + \log(1/\epsilon))$-wise independent family of
functions h is $\epsilon$-approximately t-min-wise independent on X.

In addition, one can evaluate h on t items simultaneously in total time $t \cdot (\log(t/\epsilon))^{O(1)}$ by
using fast multipoint polynomial evaluation (see, e.g., Theorem 13 of [12], where this idea was
used for a different streaming problem). Further, one can spread the evaluation of h on t items
evenly across the next t stream updates, converting this amortized $(\log(t/\epsilon))^{O(1)}$ update time to a
worst-case update time.
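A standard way to obtain a k-wise independent family is a random polynomial of degree k − 1 over a prime field. The sketch below is a minimal illustration of that construction, not the construction of [8] or [12]. The prime $2^{61} - 1$, the function names, and the use of Horner's rule (one point at a time, rather than fast multipoint evaluation) are all assumptions made here for brevity; keys are assumed to lie in $[0, P)$.

```python
import random

P = (1 << 61) - 1   # a Mersenne prime, used here as the field size (assumption)

def make_kwise_hash(k, rng):
    """Draw a uniformly random polynomial of degree at most k - 1 over GF(P).
    On distinct keys in [0, P) its values are k-wise independent."""
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x):
        acc = 0
        for c in coeffs:            # Horner's rule, leading coefficient first
            acc = (acc * x + c) % P
        return acc

    return h

def t_smallest(items, h, t):
    """The t items with the smallest hash values under h."""
    return sorted(items, key=h)[:t]

# Usage: pick 5 of 1000 edge labels by smallest hash value.
h = make_kwise_hash(8, random.Random(1))
print(t_smallest(range(1000), h, 5))
```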

We need the following generalized version of the Chernoff bound.

Lemma 2.8 ([16]). Let X be the sum of t-wise independent Boolean random variables. For any
$\epsilon \le 1$ such that $t \ge \lfloor \epsilon^2 E[X] e^{-1/3} \rfloor$, we have

$$\Pr(|X - E[X]| > \epsilon E[X]) \le \exp(-\lfloor \epsilon^2 E[X]/3 \rfloor).$$

Lemma 2.9. For any number $C \ge n$ of edge samples and constant $\delta \ge 1$, we can sample C edges
in a dynamic stream (a stream with insertions and deletions to the edges), such that the statistical
distance of our sampled edges to a uniformly random (without replacement) sample of C edges is
$O(e^{-\delta})$. This sampling algorithm uses $\tilde{O}(C)$ space and has update time $\tilde{O}(1)$.











Proof. We apply Theorem 2.7 with the n of that theorem equal to $\binom{n}{2}c'/\epsilon$, where $\epsilon$ is set to $e^{-\delta}$.
We label each possible edge of our graph with a number in $\{1, 2, \ldots, \binom{n}{2}\}$ and extend the domain
and range of h to $\{1, 2, \ldots, n^2c'/\epsilon\}$. Then, we apply the X of Theorem 2.7 to the specific subset
of $m \le \binom{n}{2}$ edges in our input graph. We apply Theorem 2.7 with $t = O(C\delta)$. Theorem 2.7 and
the definition of min-wise independence imply that the statistical distance of the subset F of t
minimum hash values of our edges under h is within $e^{-\delta}$ from t uniformly random samples without
replacement. Indeed, the probability of choosing any set Y is $\frac{1}{\binom{|X|}{|Y|}}(1 \pm \epsilon)$ rather than $\frac{1}{\binom{|X|}{|Y|}}$ had we
had full independence, so summing the absolute values of these differences gives $\ell_1$-distance at most
$\epsilon = e^{-\delta}$, and so statistical distance at most $e^{-\delta}/2$ between the distributions. Note that Theorem
2.7 implies that h is also $O(C\delta)$-wise independent (in the standard sense, not the min-wise sense),
and so h is $\Omega(n)$-wise independent using our assumptions that $C \ge n$ and $\delta \ge 1$.

Now we show how to maintain the C edges with the smallest hash values under h in a dynamic
stream. We use $\log(n^2)$ sparse recovery data structures, $sp_1, sp_2, \ldots, sp_{2\log(n)}$, each to recover $9\delta C$
edges. Note that $n^2$ is an upper bound on the total number of distinct edges in our graph. Upon
the update (insertion or deletion) of an edge e, we update each sparse recovery structure $sp_i$ such
that $i \in [1, \lfloor \log_2(h(e)) \rfloor]$. For the sparse recovery structure, we use the data structure of [10], which
has $\tilde{O}(1)$ update time, space $\tilde{O}(C)$ and succeeds with probability $1 - 1/n^2$ in returning all of the
non-zero items in a vector to which it is applied, provided this number of non-zero items is at
most $9\delta C$.

Let $Z_i$ be the number of distinct edges which hash under h to the i-th sparse recovery data
structure $sp_i$. Then if $m > 4\delta C$, there always exists one of the $sp_i$ such that $4\delta C \le E[Z_i] \le 8\delta C$
(if $m \le 4\delta C$, we can store the entire graph in $\tilde{O}(C)$ bits of space). Moreover, $Z_i$ for any given i
is fairly concentrated around its expectation, since Theorem 2.7 implies that h is also $O(C\delta)$-wise
independent, so we can bound its deviation using the generalized version of the Chernoff bound
given by Lemma 2.8. Consider any $Z_i$ for which $4\delta C \le E[Z_i] \le 8\delta C$. Then we have

$$\begin{aligned}
\Pr(Z_i < 3\delta C \text{ or } Z_i > 9\delta C) &\le \Pr(|Z_i - E[Z_i]| > \delta C) \le \Pr\left(|Z_i - E[Z_i]| > \tfrac{1}{8}E[Z_i]\right) \\
&\le \exp\left(-\left\lfloor \left(\tfrac{1}{8}\right)^2 \frac{E[Z_i]}{3} \right\rfloor\right) \le \exp(-\Omega(C)).
\end{aligned}$$

Using that $C \ge n$, the error probability is $\exp(-\Omega(n))$. We also run an $\ell_0$-estimation algorithm
to estimate each $Z_i$ up to a multiplicative factor of 1.1, with total space $\tilde{O}(1)$ and update time
$\tilde{O}(1)$, and with failure probability $1/n^{\Omega(1)}$. It follows from the above that for an i for which
$4\delta C \le E[Z_i] \le 8\delta C$, we have that $Z_i \in [3\delta C, 9\delta C]$ with probability $1 - \exp(-\Omega(n))$. It follows that
our $\ell_0$-estimate for this value of $Z_i$ will be in $[0, 10\delta C]$ with probability $1 - 1/n^{\Omega(1)}$. Hence, from
$sp_i$, with probability $1 - 1/n^{\Omega(1)}$ we will recover all values that hash to $sp_i$, and as argued above
these are within statistical distance $e^{-\delta}$ from uniform. □
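The update and recovery logic can be illustrated with a toy stand-in in which each sparse recovery structure is replaced by an exact dictionary. This is only a sketch of the bookkeeping under several assumptions that are not from the proof: level i here keeps the edges whose hash value is below $2^{L-i}$ (so expected occupancy halves per level, and the indexing may differ from the rule stated above), the hash is a seeded pseudo-random value rather than a limited-independence function, and the class name `LevelSampler` is hypothetical. Because it stores edges exactly, it does not achieve the stated space bound.

```python
import random

class LevelSampler:
    """Toy version of the leveled structures: level i holds the live edges e
    with hash(e) < 2**(num_levels - i). Exact dicts replace sparse recovery."""

    def __init__(self, num_levels, capacity, seed=0):
        self.num_levels = num_levels
        self.capacity = capacity        # plays the role of the 9*delta*C limit
        self.seed = seed
        self.levels = [dict() for _ in range(num_levels + 1)]

    def _hash(self, edge):
        # Hypothetical stand-in for h: deterministic value in [0, 2**num_levels).
        return random.Random(f"{self.seed}:{edge}").randrange(1 << self.num_levels)

    def update(self, edge, delta):
        """Apply an insertion (delta = +1) or a deletion (delta = -1)."""
        top = self.num_levels - self._hash(edge).bit_length()
        for i in range(top + 1):        # edge belongs to levels 0..top
            count = self.levels[i].get(edge, 0) + delta
            if count:
                self.levels[i][edge] = count
            else:
                self.levels[i].pop(edge, None)

    def recover(self):
        """Return (level, edges) for the first level that fits in capacity."""
        for i, level in enumerate(self.levels):
            if len(level) <= self.capacity:
                return i, set(level)
        return None, set()
```

A deletion simply cancels the earlier insertion at every level the edge hashes to, which is why the recovered set depends only on the edges currently present and not on the order of updates.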


Combining Lemma 2.9 with Theorem 2.5 proves Theorem 1.1.

To the best of our knowledge, the fastest known 0.5-approximation algorithm for the densest
subgraph problem has a running time of $O(m + n)$. However, here we need to find the densest
subgraph of a sampled graph with at most $C = \tilde{O}(n)$ edges. Thus, the running time of our algorithm
is $\tilde{O}(n)$. To the best of our knowledge, this is the fastest $(0.5 - \epsilon)$-approximation algorithm for the
densest subgraph problem. Theorem 1.2 states this fact.







3 A General Family of Problems 


Here, we extend our results to the heavy subgraph problems. Specifically, we show that given an
offline $\alpha$-approximation algorithm for a $(\gamma, l)$-heavy subgraph problem $\mathcal{P}(\gamma, l)$, Algorithm 2 is an
$(\alpha - \epsilon)$-approximation for $\mathcal{P}(\gamma, l)$. Later, in Section 4, we show several applications of this algorithm.
In this section, we denote the solution in $Sol_G^k$ that maximizes f by opt(G,k).

Algorithm 2 A General Algorithm
Input: A graph G, a heavy subgraph problem $\mathcal{P}(\gamma, l)$ and an $\alpha$-approximation algorithm Alg for $\mathcal{P}$.
Output: An $(\alpha - \epsilon)$ estimator of $\mathcal{P}$ on graph G, w.pr. $1 - e^{-\delta}$.
1: Set $C = \frac{12n(4+\delta)\log(l)}{\gamma\epsilon^2}$.
2: if $|E| \le C$ then
3:     Return Alg(G).
4: else
5:     Sample C edges uniformly at random, without replacement, from G.
6:     Let H be the sampled graph.
7:     Return $\frac{1}{p}Alg(H)$.
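Algorithm 2 is a thin wrapper around any offline algorithm, which a short sketch makes visible. Here `alg` is assumed to be a callable that takes an edge list and returns an objective value, `budget` stands for C, and $p = C/m$ as in Lemma 3.1; the function name is illustrative.

```python
import random

def estimate_by_sampling(edges, budget, alg, rng):
    """Run `alg` on a uniform sample of `budget` edges and rescale by 1/p,
    where p = budget / |E|; run `alg` on the whole graph if it is small."""
    edges = list(edges)
    if len(edges) <= budget:
        return alg(edges)
    sample = rng.sample(edges, budget)          # uniform, without replacement
    return alg(sample) * len(edges) / budget    # same as alg(sample) / p

# Usage with a trivial objective (the number of edges), for which the
# rescaled estimate is exact regardless of which edges are sampled.
rng = random.Random(0)
G = [(u, v) for u in range(50) for v in range(u + 1, 50)]
print(len(G), estimate_by_sampling(G, 100, len, rng))
```

The edge-counting objective is chosen only because its outcome does not depend on the random sample; any objective satisfying local linearity is rescaled in the same way.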


The following lemma is the generalized version of Lemma 2.3.

Lemma 3.1. Let G be the input graph, let $\mathcal{P}(\gamma, l)$ be a heavy subgraph problem, and let Alg be
an $\alpha$-approximation algorithm for problem $\mathcal{P}$. Let $H = (V, E_H)$ be the graph sampled by Algorithm
2 and let $p = \frac{C}{m}$. We have

$$\Pr\big(\alpha\, opt(G,k) - f_G(Alg(H)) > \epsilon\, opt(G)\big) \le 6\exp\left(\log(|Sol_G^k|) - \frac{p\epsilon^2 opt(G)}{12f_k}\right),$$

where $f_G(Alg(H))$ is the objective value in G of a solution $sol_G$ such that $Alg(H) = sol_G \cap H$.

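Proof. We prove this lemma in a similar way to Lemma 2.3. Again we define $x_e$ to be
the random variable that indicates whether e exists in $E_H$ or not. However, here we let $sol_G$ be
an arbitrary solution from $Sol_G^k$. Let $sol_H$ be a spanning subgraph of $sol_G$ that contains the edges
that appear in both $sol_G$ and H. The hereditary property says that $sol_H \in Sol_H^k$.

By definition, the number of edges in $sol_H$ is $\sum_{e \in sol_H} 1 = \sum_{e \in sol_G} x_e$. We denote this summation
by X. Using the local linearity property we have

$$E[f(sol_H)] = E\Big[f_k \sum_{e \in sol_G} x_e\Big] = f_k \sum_{e \in sol_G} E[x_e] = f_k \sum_{e \in sol_G} p = p \cdot f_k \sum_{e \in sol_G} 1 = p \cdot f(sol_G), \tag{5}$$

where the first and the last equalities are by Local Linearity, and the third equality is by definition
of p. Again we can use Lemma 2.2 to claim that the random variables $x_1, x_2, \ldots, x_m$ are negatively
correlated. Thus, to bound X, we can apply the following form of the Chernoff bound from
Lemma 2.1:

$$\Pr(|X - E[X]| > \epsilon' E[X]) \le 3\exp(-\epsilon'^2 E[X]/3).$$

By setting $\epsilon' = \frac{p\epsilon\, opt(G)}{2f_k E[X]}$ we have

$$\begin{aligned}
\Pr\left(|X - E[X]| > \frac{p\epsilon\, opt(G)}{2f_k}\right) &\le 3\exp\left(-\frac{p^2\epsilon^2 opt(G)^2}{4f_k^2 E[X]^2}\cdot\frac{E[X]}{3}\right) \\
&= 3\exp\left(-\frac{p^2\epsilon^2 opt(G)^2}{12f_k^2 E[X]}\right) \\
&\le 3\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right). && \text{Using Equality (5)}
\end{aligned}$$

On the other hand we have

$$\begin{aligned}
\Pr\left(|X - E[X]| > \frac{p\epsilon\, opt(G)}{2f_k}\right) &= \Pr\left(\frac{1}{p}f_k|X - E[X]| > \frac{\epsilon}{2}opt(G)\right) \\
&= \Pr\left(\left|\frac{1}{p}f_k X - \frac{1}{p}f_k E[X]\right| > \frac{\epsilon}{2}opt(G)\right) \\
&= \Pr\left(\left|\frac{1}{p}f(sol_H) - f(sol_G)\right| > \frac{\epsilon}{2}opt(G)\right). && \text{Using Equality (5)}
\end{aligned}$$

Therefore, we have

$$\Pr\left(\left|\frac{1}{p}f(sol_H) - f(sol_G)\right| > \frac{\epsilon}{2}opt(G)\right) \le 3\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right). \tag{6}$$

If we set $sol_G$ to be opt(G,k), Inequality (6) states that

$$\Pr\left(opt(G,k) - \frac{1}{p}f_H(opt(G,k)) > \frac{\epsilon}{2}opt(G)\right) \le 3\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right),$$

which immediately gives us

$$\Pr\left(opt(G,k) - \frac{1}{p}opt(H,k) > \frac{\epsilon}{2}opt(G)\right) \le 3\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right). \tag{7}$$

On the other hand, Inequality (6) states that for each selection of $sol_G$, with probability
$1 - 3\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right)$, we can upper bound $\frac{1}{p}f(sol_H)$ by $f(sol_G) + \frac{\epsilon}{2}opt(G)$. Indeed, we have $|Sol_G^k|$
such choices. Thus, by applying a union bound we have

$$\Pr\left(\exists\, sol_G \in Sol_G^k : \frac{1}{p}f(sol_H) - f(sol_G) > \frac{\epsilon}{2}opt(G)\right) \le 3|Sol_G^k|\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right).$$

If we select $sol_G$, using the hereditary property, such that $sol_H = Alg(H)$, we have

$$\Pr\left(\frac{1}{p}Alg(H) - f_G(Alg(H)) > \frac{\epsilon}{2}opt(G)\right) \le 3|Sol_G^k|\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right).$$

Now, given that $Alg(H) \ge \alpha\, opt(H)$, we have

$$\Pr\left(\frac{1}{p}opt(H) - \frac{1}{\alpha}f_G(Alg(H)) > \frac{\epsilon}{2\alpha}opt(G)\right) \le 3|Sol_G^k|\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right). \tag{8}$$

Therefore, by combining Inequalities (7) and (8) and applying the union bound we have

$$\begin{aligned}
\Pr\left(opt(G,k) - \frac{1}{\alpha}f_G(Alg(H)) > \frac{1}{\alpha}\epsilon\, opt(G)\right) &\le 3\big(|Sol_G^k| + 1\big)\exp\left(-\frac{p\epsilon^2 opt(G)}{12f_k}\right) \\
&\le 6\exp\left(\log(|Sol_G^k|) - \frac{p\epsilon^2 opt(G)}{12f_k}\right).
\end{aligned}$$

□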


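Now, we are ready to prove Theorem 1.3.

Proof of Theorem 1.3. Lemma 3.1 together with Equality (5) implies that for each k, with probability
at least $1 - 6\exp\left(\log(|Sol_G^k|) - \frac{p\epsilon^2 opt(G)}{12f_k}\right)$ we have $\alpha\, opt(G,k) - \frac{1}{p}Alg(H) \le \epsilon\, opt(G)$. By a union
bound, this holds for all $1 \le k \le l$, with probability $1 - \sum_{k=1}^{l} 6\exp\left(\log(|Sol_G^k|) - \frac{p\epsilon^2 opt(G)}{12f_k}\right)$. Thus, for
some k, with probability $1 - 6l \cdot \exp\left(\log(|Sol_G^k|) - \frac{p\epsilon^2 opt(G)}{12f_k}\right)$ we have $\alpha\, opt(G) - \frac{1}{p}Alg(H) \le \epsilon\, opt(G)$,
which means that Algorithm 2 outputs an $(\alpha - \epsilon)$-approximation.

Now, if we set p to $\frac{12n(4+\delta)\log(l)}{\gamma\epsilon^2 m}$, or equivalently set C in Algorithm 2 to $\frac{12n(4+\delta)\log(l)}{\gamma\epsilon^2}$, we have

$$\begin{aligned}
&1 - 6l \cdot \exp\left(\log(|Sol_G^k|) - \frac{p\epsilon^2 opt(G)}{12f_k}\right) \\
&= 1 - 6l \cdot \exp\left(\log(|Sol_G^k|) - \frac{12n(4+\delta)\log(l)}{\gamma\epsilon^2 m}\cdot\frac{\epsilon^2 opt(G)}{12f_k}\right) \\
&\ge 1 - 6l \cdot \exp\left(\log(|Sol_G^k|) - \frac{n(4+\delta)\log(l)}{\gamma m f_k}\cdot\gamma\log(|Sol_G^k|)f_k\frac{m}{n}\right) && \text{From the } \gamma \text{ Bound} \\
&= 1 - 6l \cdot \exp\big(\log(|Sol_G^k|) - (4+\delta)\log(l)\log(|Sol_G^k|)\big) \\
&\ge 1 - \exp\big(\log(l) + 2 + \log(|Sol_G^k|) - (4+\delta)\log(l)\log(|Sol_G^k|)\big) \\
&\ge 1 - \exp\big(-\delta\log(l)\log(|Sol_G^k|)\big) \\
&= 1 - e^{-\delta}.
\end{aligned}$$

□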


4 Applications 

There are several problems that fit into the class of heavy subgraph problems. Some examples are 
densest bipartite subgraph, directed densest subgraph, d-max cut, and d-sum-max clustering. In
this section we define each of these problems and prove that each satisfies the properties required 
of a heavy subgraph problem. 















4.1 Densest Bipartite Subgraph 

In the densest bipartite subgraph problem, we are given a general graph G and we aim to find a
bipartite subgraph sol of G with the maximum density. Let opt be a densest bipartite subgraph of
G, with parts $A_{opt}$ and $B_{opt}$. Then opt contains all edges of G that are between $A_{opt}$ and $B_{opt}$. We
call such a subgraph feasibly maximal and, without loss of generality, restrict all of the solutions to
be feasibly maximal. In fact, we can indicate a feasibly maximal solution sol by its two parts $A_{sol}$
and $B_{sol}$.


Proof of the first part of Theorem 1.4. For normalization purposes, we increase the value of the
objective function by a factor of n. Without loss of generality define the density to be $n \cdot \frac{|E_{sol}|}{|V_{sol}|}$.

We set l = n, and for any $1 \le k \le l$, we let $Sol_G^k$ be the set of all solutions sol such that
$|A_{sol}| + |B_{sol}| = n - k + 1$. Thus, we have $|Sol_G^k| = \binom{n}{n-k+1}2^{n-k+1}$.

Local Linearity: The density of a solution sol in $Sol_G^k$ is $n\frac{|E_{sol}|}{|V_{sol}|} = |E_{sol}|\frac{n}{n-k+1}$. Thus, we can
set $f_k = \frac{n}{n-k+1}$, which is increasing in k and we have $f_1 = \frac{n}{n} = 1$, as desired.

Hereditary Property: Let H be a spanning subgraph of G. For any solution $sol \in Sol_G^k$, the
intersection of sol and H remains bipartite and feasibly maximal, and thus, is a solution in $Sol_H$.
Moreover, by definition, the number of vertices of this intersection is the same as sol. Thus, it
belongs to $Sol_H^k$. On the other hand, for any solution $sol \in Sol_H^k$ with parts $A_{sol}$ and $B_{sol}$, let
sol' be the bipartite maximal subgraph of G between the partitions $A_{sol}$ and $B_{sol}$. By definition,
$sol' \in Sol_G^k$ and clearly sol' satisfies $sol = sol' \cap H$.

$\gamma$ Bound: In fact, G contains a bipartite subgraph that contains at least $\frac{m}{2}$ edges. The density
of this subgraph is at least $n\frac{m/2}{n} = \frac{m}{2}$. On the other hand we have $f_k = \frac{n}{n-k+1}$ and
$|Sol_G^k| = \binom{n}{n-k+1}2^{n-k+1}$. Thus, if we set $\gamma$ to $\frac{1}{2(\log(n)+1)}$, we have

$$\begin{aligned}
\gamma\log(|Sol_G^k|)f_k\frac{m}{n} &= \frac{1}{2(\log(n)+1)}\log\left(\binom{n}{n-k+1}2^{n-k+1}\right)\frac{n}{n-k+1}\cdot\frac{m}{n} \\
&\le \frac{1}{2(\log(n)+1)}\big((n-k+1)\log(n) + (n-k+1)\big)\frac{n}{n-k+1}\cdot\frac{m}{n} \\
&\le \frac{1}{2(\log(n)+1)}(\log(n)+1)m \\
&= \frac{m}{2} \le opt.
\end{aligned}$$

□


4.2 Directed Densest Subgraph 

In the directed densest subgraph problem we are given a directed graph G, and we want to find two not
necessarily disjoint sets $A, B \subseteq V_G$ to maximize $\frac{|E(A,B)|}{\sqrt{|A| \cdot |B|}}$, where $E(A,B)$ is the set of all edges
$(u, v) \in E_G$ such that $u \in A$ and $v \in B$.


Proof of the second part of Theorem 1.4. For normalization, we increase the objective function by a
factor of $n$ and define it as $n \cdot \frac{|E(A,B)|}{\sqrt{|A| \cdot |B|}}$. For simplicity, here we index the solution sets using a pair
of indices $i$ and $j$. Here, $Sol_G^{i,j}$ contains any solution $sol = (A, B)$ such that $i = |A|$ and $j = |B|$.
Thus, we have $|Sol_G^{i,j}| = \binom{n}{i}\binom{n}{j}$, and $l = n^2$.



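Local Linearity: In fact, for a solution $sol = (A, B) \in Sol_G^{i,j}$, the directed density of $sol$ is
$n \cdot \frac{|E(A,B)|}{\sqrt{|A| \cdot |B|}} = |E(A,B)| \cdot \frac{n}{\sqrt{i \cdot j}}$. Thus, we can define $f_{i,j} = \frac{n}{\sqrt{i \cdot j}}$, and we have $\min_{i,j}(f_{i,j}) = f_{n,n} = \frac{n}{\sqrt{n \cdot n}} = 1$, as desired.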

Hereditary Property: For any spanning subgraph $H \subseteq G$ and any solution $sol = (A, B) \in Sol_G^{|A|,|B|}$,
the same sets $A$ and $B$ indicate the intersection of $H$ and $sol$, and thus this intersection is a solution in
$Sol_H^{|A|,|B|}$. On the other hand, for any solution $sol \in Sol_H^{|A|,|B|}$ with sets $A$ and $B$, let $sol'$ be the
solution on $G$ corresponding to the sets $A$ and $B$. By definition, $sol' \in Sol_G^{|A|,|B|}$, and clearly $sol'$
satisfies $sol = sol' \cap H$.

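$\gamma$ Bound: The directed density of the solution $sol = (V_G, V_G)$ is $n \cdot \frac{m}{\sqrt{n \cdot n}} = m$. Therefore, the
optimum is lower bounded by $m$. On the other hand, for any $i$ and $j$ we have $|Sol_G^{i,j}| = \binom{n}{i}\binom{n}{j}$ and
$f_{i,j} = \frac{n}{\sqrt{i \cdot j}}$. If we set $\gamma$ to $\frac{1}{2\sqrt{n}\log(n)}$, we have

\[
\begin{aligned}
\gamma \log(|Sol_G^{i,j}|) f_{i,j} \frac{m}{n}
&= \gamma \log\left(\binom{n}{i}\binom{n}{j}\right) \frac{n}{\sqrt{i \cdot j}} \cdot \frac{m}{n} \\
&\le \gamma \big(i \cdot \log(n) + j \cdot \log(n)\big) \frac{n}{\sqrt{i \cdot j}} \cdot \frac{m}{n} \\
&= \gamma \, \frac{i \cdot \log(n) + j \cdot \log(n)}{\sqrt{i \cdot j}} \, m \\
&= \gamma \left(\frac{\sqrt{i} \cdot \log(n)}{\sqrt{j}} + \frac{\sqrt{j} \cdot \log(n)}{\sqrt{i}}\right) m \\
&\le \gamma \cdot 2\sqrt{n}\log(n)\, m \\
&= \frac{1}{2\sqrt{n}\log(n)} \cdot 2\sqrt{n}\log(n)\, m \\
&= m \le opt.
\end{aligned}
\]

□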
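
The two inequalities in the derivation above rely only on $1 \le i, j \le n$ and on the standard estimate $\binom{n}{i} \le n^i$. Spelled out as a short sketch (this elaboration is not part of the original proof text):

```latex
% Counting step: \binom{n}{i} \le n^i and \binom{n}{j} \le n^j, so
\[
\log\left(\binom{n}{i}\binom{n}{j}\right) \le i \cdot \log(n) + j \cdot \log(n).
\]
% Square-root step: for 1 \le i, j \le n we have i/j \le n and j/i \le n, hence
\[
\frac{\sqrt{i} \cdot \log(n)}{\sqrt{j}} + \frac{\sqrt{j} \cdot \log(n)}{\sqrt{i}}
= \left(\sqrt{\tfrac{i}{j}} + \sqrt{\tfrac{j}{i}}\right)\log(n)
\le \left(\sqrt{n} + \sqrt{n}\right)\log(n)
= 2\sqrt{n}\,\log(n).
\]
```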


4.3 d-Max Cut

In the d-max cut problem, we are given a graph G and are supposed to mark the vertices using d 
labels to maximize the number of edges with different labels. 
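
For intuition, the objective can be evaluated by brute force on very small graphs. The sketch below is illustrative only: it assumes vertices are numbered $0, \ldots, n-1$ and edges are given as pairs, and it enumerates all $d^n$ labelings, so it is not an algorithm one would use at scale. The function names are hypothetical.

```python
from itertools import product


def cut_value(edges, labels):
    """Number of edges whose two endpoints receive different labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])


def brute_force_d_max_cut(n, edges, d):
    """Try all d**n labelings of vertices 0..n-1 and return (best value, labeling)."""
    best_value, best_labels = -1, None
    for labels in product(range(d), repeat=n):
        value = cut_value(edges, labels)
        if value > best_value:
            best_value, best_labels = value, labels
    return best_value, best_labels


if __name__ == "__main__":
    # A triangle on vertices 0, 1, 2 with a pendant edge to vertex 3.
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(brute_force_d_max_cut(4, edges, 2))
    print(brute_force_d_max_cut(4, edges, 3))
```

With two labels the triangle forces one uncut edge, while three labels cut every edge of this toy graph.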

Proof of the third part of Theorem 1.4. Here we simply let all the solutions be in $Sol_G^1$. Indeed, we
have $l = 1$ and $|Sol_G^1| = d^n$.

Local Linearity: Clearly we have $f_1 = f_l = 1$.

Hereditary Property: For any spanning subgraph $H \subseteq G$ and any solution $sol \in Sol_G^1$, the
same labeling of $sol$ on $H$ gives us the intersection of $sol$ and $H$. Thus, the intersection of $sol$ and
$H$ is a solution in $Sol_H = Sol_H^1$. On the other hand, similarly, for any solution $sol \in Sol_H^1$, the
same labeling gives us a solution $sol' \in Sol_G^1$ such that $sol = sol' \cap H$.

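$\gamma$ Bound: Again here, if we just use 2 labels, there is a solution with value at least $\frac{m}{2}$. Thus, we have
$opt \ge \frac{m}{2}$. If we set $\gamma$ to $\frac{1}{2\log(d)}$, we have

\[
\gamma \log(|Sol_G^1|) f_1 \frac{m}{n}
= \frac{1}{2\log(d)} \log(d^n) \frac{m}{n}
= \frac{1}{2\log(d)} \log(d)\, m
= \frac{m}{2} \le opt.
\]

□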



4.4 d-Sum-Max Clustering

This problem is fairly similar to d-max cut. Again we are given a graph $G$ and are supposed to
mark the vertices using $d$ labels. However, here we have to use all $d$ labels and want to maximize
the number of edges whose endpoints have the same label.

Proof of the fourth part of Theorem 1.4. Again, here we simply let all the solutions be in $Sol_G^1$. So we
have $l = 1$ and $|Sol_G^1| \le d^n$.

Local Linearity: We have $f_1 = 1$.

Hereditary Property: Consider a spanning subgraph $H \subseteq G$, and let $sol \in Sol_G^1$ be an arbitrary
solution. Again, we can use the same labeling as $sol$ on $H$ to get the intersection of $sol$ and $H$.
Thus, the intersection of $sol$ and $H$ is a solution in $Sol_H^1$. On the other hand, again, for any solution
$sol \in Sol_H^1$, the same labeling gives us a solution $sol' \in Sol_G^1$ such that $sol = sol' \cap H$.

$\gamma$ Bound: Suppose we choose $d-1$ vertices uniformly at random and label them with labels
$1, 2, \ldots, d-1$, and label all the other vertices with $d$. Then the probability that one of the endpoints
of a fixed edge is not labeled by $d$ is at most $\frac{2(d-1)}{n}$. Thus, the expected number of same-label edges in such
a solution is at least $m - 2m\frac{d-1}{n} = m\frac{n-2(d-1)}{n} \ge m\frac{n-2d}{n}$. Thus, the optimum solution has at least $m\frac{n-2d}{n}$
edges. Now, if we set $\gamma$ to $\frac{n-2d}{n\log(d)}$, we have

\[
\gamma \log(|Sol_G^1|) f_1 \frac{m}{n}
\le \frac{n-2d}{n\log(d)} \log(d^n) \frac{m}{n}
= \frac{n-2d}{n\log(d)} \log(d)\, m
= m\,\frac{n-2d}{n} \le opt.
\]

□
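
The expectation used in this bound can also be computed exactly, which gives a quick check of the union-bound estimate. The sketch below assumes a simple graph without self-loops, so that an edge has the same label at both endpoints exactly when neither endpoint is among the $d-1$ chosen vertices; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def expected_same_label_edges(n: int, m: int, d: int) -> Fraction:
    """Exact expected number of same-label edges under the random labeling.

    d - 1 distinct vertices are chosen uniformly at random and receive the
    labels 1..d-1; every other vertex receives label d. Assumes a simple
    graph (no self-loops) with n >= 2 vertices and m edges.
    """
    if n < 2 or not 1 <= d <= n:
        raise ValueError("need n >= 2 and 1 <= d <= n")
    # Both endpoints get label d iff the d-1 chosen vertices avoid them.
    p_both_d = Fraction(comb(n - 2, d - 1), comb(n, d - 1))
    return m * p_both_d


def proof_lower_bound(n: int, m: int, d: int) -> Fraction:
    """The lower bound m * (n - 2d) / n used in the proof."""
    return Fraction(m * (n - 2 * d), n)


if __name__ == "__main__":
    n, m, d = 10, 20, 3
    print(expected_same_label_edges(n, m, d), proof_lower_bound(n, m, d))
```

For these illustrative values the exact expectation is $112/9$, comfortably above the bound $m\frac{n-2d}{n} = 8$ from the proof.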


References 

[1] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification, spanners,
and subgraphs. In Proceedings of the 31st symposium on Principles of Database Systems,
pages 5-14. ACM, 2012.

[2] Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. Densest subgraph in streaming and
MapReduce. Proceedings of the VLDB Endowment, 5(5):454-465, 2012.

[3] Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos E.
Tsourakakis. Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass
dynamic streams. In STOC, 2015.

[4] Moses Charikar. Greedy approximation algorithms for finding dense components in a graph.
In Approximation Algorithms for Combinatorial Optimization, pages 84-95. Springer, 2000.

[5] Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan. Comparing data streams
using Hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng., 15(3):529-540, 2003.

[6] Yon Dourisboure, Filippo Geraci, and Marco Pellegrini. Extraction and classification of dense 
communities in the web. In Proceedings of the 16th international conference on World Wide 
Web, pages 461-470. ACM, 2007. 



[7] Hossein Esfandiari, MohammadTaghi HajiAghayi, Mohammad Reza Khani, Vahid Liaghat,
Hamid Mahini, and Harald Räcke. Online stochastic reordering buffer scheduling. In Automata,
Languages, and Programming, pages 465-476. Springer, 2014.

[8] Guy Feigenblat, Ely Porat, and Ariel Shiftan. Exponential time improvement for min-wise 
based algorithms. In Proceedings of the twenty-second annual ACM-SIAM symposium on 
Discrete Algorithms, pages 57-66. SIAM, 2011. 

[9] David Gibson, Ravi Kumar, and Andrew Tomkins. Discovering large dense subgraphs in 
massive graphs. In Proceedings of the 31st international conference on Very large data bases, 
pages 721-732. VLDB Endowment, 2005. 

[10] Anna C. Gilbert, Yi Li, Ely Porat, and Martin J. Strauss. Approximate sparse recovery: 
optimizing time and measurements. In Proceedings of the 42nd ACM Symposium on Theory 
of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 475-484, 
2010. 

[11] Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for $L_p$ samplers, finding
duplicates in streams, and related problems. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART
Symposium on Principles of Database Systems, PODS 2011, June 12-16,
2011, Athens, Greece, pages 49-58, 2011.

[12] Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation 
in data streams in optimal space. In Proceedings of the 43rd ACM Symposium on Theory of 
Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 745-754, 2011. 

[13] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Streaming lower bounds for approximating
max-cut. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on
Discrete Algorithms, pages 1263-1282. SIAM, 2015.

[14] Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. Densest subgraph in 
dynamic graph streams. In Mathematical Foundations of Computer Science 2015. Springer, 
2015. 

[15] Alessandro Panconesi and Aravind Srinivasan. Randomized distributed edge coloring via an
extension of the Chernoff-Hoeffding bounds. SIAM Journal on Computing, 26(2):350-368, 1997.

[16] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-Hoeffding bounds for
applications with limited independence. SIAM Journal on Discrete Mathematics, 8(2):223-250,
1995.

